
(ICCV 2015) Bilinear CNN Models for Fine-grained Visual Recognition

Keyword [Bilinear CNN]

Lin T Y, RoyChowdhury A, Maji S. Bilinear cnn models for fine-grained visual recognition[C]//Proceedings of the IEEE International Conference on Computer Vision. 2015: 1449-1457.


1. Overview


The two-streams hypothesis of the human brain holds that the brain has two visual systems:

  • Ventral stream (the "what" pathway): involved in object recognition
  • Dorsal stream (the "where" pathway): processes the spatial location of objects relative to the viewer

Inspired by this hypothesis, the paper proposes a bilinear model that can be trained end-to-end and is well suited to fine-grained classification.
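End-to-end training is possible because the pooled bilinear feature is differentiable with respect to both stream outputs. The minimal NumPy sketch below assumes each stream's feature map has been flattened to a matrix (A of shape L×c1 and B of shape L×c2, with L = h·w locations), so that the sum-pooled outer products equal x = AᵀB. It uses a toy squared-error loss with a hypothetical target T, chosen only for illustration, and checks the gradients ∂ℓ/∂A = B(∂ℓ/∂x)ᵀ and ∂ℓ/∂B = A(∂ℓ/∂x) against finite differences.

```python
import numpy as np

def bilinear(A, B):
    # Sum over locations of outer products a_l b_l^T equals A^T B, shape (c1, c2)
    return A.T @ B

def loss_and_grads(A, B, T):
    """Toy loss l = 0.5 * ||A^T B - T||^2 (T is a hypothetical target)."""
    x = bilinear(A, B)
    loss = 0.5 * float(np.sum((x - T) ** 2))
    dx = x - T            # dl/dx, shape (c1, c2)
    dA = B @ dx.T         # dl/dA = B (dl/dx)^T, shape (L, c1)
    dB = A @ dx           # dl/dB = A (dl/dx),   shape (L, c2)
    return loss, dA, dB

def numeric_grad(f, M, h=1e-5):
    # Central finite differences, perturbing M in place one entry at a time
    g = np.zeros_like(M)
    for idx in np.ndindex(M.shape):
        old = M[idx]
        M[idx] = old + h
        fp = f()
        M[idx] = old - h
        fm = f()
        M[idx] = old
        g[idx] = (fp - fm) / (2 * h)
    return g

rng = np.random.default_rng(0)
L, c1, c2 = 6, 4, 3
A = rng.standard_normal((L, c1))
B = rng.standard_normal((L, c2))
T = rng.standard_normal((c1, c2))

_, dA, dB = loss_and_grads(A, B, T)
f = lambda: loss_and_grads(A, B, T)[0]
print(np.allclose(dA, numeric_grad(f, A), atol=1e-5))  # True
print(np.allclose(dB, numeric_grad(f, B), atol=1e-5))  # True
```

Because both gradients are plain matrix products, a framework with automatic differentiation can backpropagate through the bilinear layer into both streams at the same time.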



The model consists of two streams, responsible respectively for

  • localization (where) [part detector]
  • appearance modeling (what) [feature extractor]
    In the final experiments, however, the two streams show no clear division of labor: both tend to fire on specific semantic parts.
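The two-stream design can be written compactly as follows. This is a sketch that simplifies each stream to produce one feature vector per spatial location l of image I (an assumption made here for readability). f_A and f_B are the two streams, P is the pooling function over the set of locations 𝓛, and C is the classifier.

```latex
\mathcal{B} = (f_A,\, f_B,\, \mathcal{P},\, \mathcal{C}),
\qquad f_A(l, I) \in \mathbb{R}^{c_1},\quad f_B(l, I) \in \mathbb{R}^{c_2}

\operatorname{bilinear}(l, I) = f_A(l, I)\, f_B(l, I)^{\top} \in \mathbb{R}^{c_1 \times c_2}

\Phi(I) = \mathcal{P}\big(\{\operatorname{bilinear}(l, I)\}_{l \in \mathcal{L}}\big)
        = \sum_{l \in \mathcal{L}} \operatorname{bilinear}(l, I),
\qquad \hat{y} = \mathcal{C}\big(\Phi(I)\big)
```

Because every entry of Φ(I) multiplies a "where" response with a "what" response at the same location, the roles of the two streams are symmetric in the formula. This is consistent with the observation that neither stream ends up specializing cleanly.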



2. Computation


Once the two streams have produced feature maps of shapes (h, w, c1) and (h, w, c2):

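  • First, take the outer product of the two feature vectors at each spatial location, which models part-feature interaction. (h, w, c1*c2)
  • Next, sum-pool over all spatial locations. (c1*c2)
  • Then, apply the signed square root y = sign(x)·√|x| element-wise, followed by L2 normalization z = y / ‖y‖₂.
  • Finally, feed the normalized vector z to a classifier.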
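The four steps above can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation. The stream outputs are random arrays standing in for CNN feature maps, `bilinear_pool` and `classify` are hypothetical helper names, and the linear classifier weights are placeholders. Steps 1 and 2 are fused into a single `np.einsum` call, which computes the same sum of per-location outer products without materializing the (h, w, c1*c2) tensor.

```python
import numpy as np

def bilinear_pool(fa: np.ndarray, fb: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Bilinear pooling of two stream outputs.

    fa: (h, w, c1) feature map from stream A
    fb: (h, w, c2) feature map from stream B
    Returns a (c1 * c2,) descriptor after signed sqrt and L2 normalization.
    """
    h, w, c1 = fa.shape
    hb, wb, c2 = fb.shape
    if (h, w) != (hb, wb):
        raise ValueError("both streams must produce the same spatial grid")

    # Steps 1-2: outer product at every location, sum-pooled over h and w
    x = np.einsum("hwi,hwj->ij", fa, fb).reshape(-1)   # (c1 * c2,)

    # Step 3: signed square root, then L2 normalization (eps avoids division by zero)
    y = np.sign(x) * np.sqrt(np.abs(x))
    return y / max(np.linalg.norm(y), eps)

def classify(z: np.ndarray, W: np.ndarray, b: np.ndarray) -> int:
    # Step 4: linear classifier; W (num_classes, c1*c2) and b are hypothetical weights
    return int(np.argmax(W @ z + b))

rng = np.random.default_rng(0)
fa = rng.standard_normal((7, 7, 8))    # toy stream-A output
fb = rng.standard_normal((7, 7, 4))    # toy stream-B output
z = bilinear_pool(fa, fb)
print(z.shape)                         # (32,)
W = rng.standard_normal((5, z.size))   # 5 hypothetical classes
print(classify(z, W, np.zeros(5)))
```

The descriptor size grows as c1 × c2. With c1 = c2 = 512 (the channel count of the last convolutional layer in VGG networks), it already has 512 × 512 = 262,144 dimensions.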



3. Datasets


  • (bird) CUB-200-2011: 11,788 images, 200 bird species
  • (aircraft) FGVC-Aircraft: 10,000 images, 100 aircraft types
  • (car) Cars: 16,185 images, 196 car types

4. Experimental Results